Plan

Définition du problème

Compréhension du métier

Compréhension des données

Préparation des données

Formule de Scoring

Evaluation

Déploiement

Définition du problème

Afin de gagner la confiance des clients et d’attirer de nouveaux abonnés, la compagnie d’assurance canadienne AssuExperts Inc doit mettre en place une nouvelle politique marketing adaptée au comportement de ses clients actuels et potentiels. Pour ce faire, l’entreprise doit comprendre ses clients. De plus, il est important d’identifier un profil de client (intéressé / non intéressé) qui constituerait une solution pour déterminer une cible marketing spécifique. Cela non seulement augmentera l’efficacité de la campagne, mais réduira également les coûts en évitant les contacts inutiles.

Compréhension du métier

Notre mission est de savoir qui sont les clients actuels et potentiels de A ssur E xperts INC , et quels sont leurs degrés d’appétence pour souscrire à une police d’assurance sur caravan.

Pour ce faire, nous allons travailler sur les 3 volets suivants:

Compréhension des données

Importation du jeu des données “AssurancExpertsInc.txt”

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
AssurancExpertsInc <- read.delim("~/Documents/lab2/AssurancExpertsInc.txt")
dim_desc(AssurancExpertsInc)
## [1] "[9,822 x 87]"
head(AssurancExpertsInc)
##   SD1 SD2 SD3 SD4 SD5 SD6 SD7 SD8 SD9 SD10 SD11 SD12 SD13 SD14 SD15 SD16
## 1  33   1   3   2   8   0   5   1   3    7    0    2    1    2    6    1
## 2  37   1   2   2   8   1   4   1   4    6    2    2    0    4    5    0
## 3  37   1   2   2   8   0   4   2   4    3    2    4    4    4    2    0
## 4   9   1   3   3   3   2   3   2   4    5    2    2    2    3    4    3
## 5  40   1   4   2  10   1   4   1   4    7    1    2    2    4    4    5
## 6  23   1   2   1   5   0   5   0   5    0    6    3    3    5    2    0
##   SD17 SD18 SD19 SD20 SD21 SD22 SD23 SD24 SD25 SD26 SD27 SD28 SD29 SD30
## 1    2    7    1    0    1    2    5    2    1    1    2    6    1    1
## 2    5    4    0    0    0    5    0    4    0    2    3    5    0    2
## 3    5    4    0    0    0    7    0    2    0    5    0    4    0    7
## 4    4    2    4    0    0    3    1    2    3    2    1    4    0    5
## 5    4    0    0    5    4    0    0    0    9    0    0    0    0    4
## 6    5    4    2    0    0    4    2    2    2    2    2    4    2    9
##   SD31 SD32 SD33 SD34 SD35 SD36 SD37 SD38 SD39 SD40 SD41 SD42 SD43 PO44
## 1    8    8    0    1    8    1    0    4    5    0    0    4    3    0
## 2    7    7    1    2    6    3    2    0    5    2    0    5    4    2
## 3    2    7    0    2    9    0    4    5    0    0    0    3    4    2
## 4    4    9    0    0    7    2    1    5    3    0    0    4    4    0
## 5    5    6    2    1    5    4    0    0    9    0    0    6    3    0
## 6    0    5    3    3    9    0    5    2    3    0    0    3    3    0
##   PO45 PO46 PO47 PO48 PO49 PO50 PO51 PO52 PO53 PO54 PO55 PO56 PO57 PO58
## 1    0    0    6    0    0    0    0    0    0    0    0    0    0    0
## 2    0    0    0    0    0    0    0    0    0    0    0    0    0    0
## 3    0    0    6    0    0    0    0    0    0    0    0    0    0    0
## 4    0    0    6    0    0    0    0    0    0    0    0    0    0    0
## 5    0    0    0    0    0    0    0    0    0    0    0    0    0    0
## 6    0    0    6    0    0    0    0    0    0    0    0    0    0    0
##   PO59 PO60 PO61 PO62 PO63 PO64 PO65 PO66 PO67 PO68 PO69 PO70 PO71 PO72
## 1    5    0    0    0    0    0    0    0    0    1    0    0    0    0
## 2    2    0    0    0    0    0    2    0    0    0    0    0    0    0
## 3    2    0    0    0    0    0    1    0    0    1    0    0    0    0
## 4    2    0    0    0    0    0    0    0    0    1    0    0    0    0
## 5    6    0    0    0    0    0    0    0    0    0    0    0    0    0
## 6    0    0    0    0    0    0    0    0    0    1    0    0    0    0
##   PO73 PO74 PO75 PO76 PO77 PO78 PO79 PO80 PO81 PO82 PO83 PO84 PO85 CLASS
## 1    0    0    0    0    0    0    0    1    0    0    0    0    0    No
## 2    0    0    0    0    0    0    0    1    0    0    0    0    0    No
## 3    0    0    0    0    0    0    0    1    0    0    0    0    0    No
## 4    0    0    0    0    0    0    0    1    0    0    0    0    0    No
## 5    0    0    0    0    0    0    0    1    0    0    0    0    0    No
## 6    0    0    0    0    0    0    0    0    0    0    0    0    0    No
##     STATUS
## 1 Learning
## 2 Learning
## 3 Learning
## 4 Learning
## 5 Learning
## 6 Learning

On a un jeu de données qui est composé de 87 variables et 9822 observations.

Les caractéristiques recueillies des clients sont partagées en deux familles de variables comme suit : - 43 attributs sociodémographiques : SD1, …SD43 - 42 attributs propriétaire de produit : PO44,…PO85 - L’attribut CLASS représente la réponse des clients interrogés. La valeur YES correspond à ceux qui désirent avoir des maisons mobiles. - La colonne STATUS indique si l’observation est destinée à la phase d’apprentissage ou à la phase de test.

Pour une meilleure visualisation des données, nous avons remplacé les noms des colonnes et les valeurs chiffrés par leurs descriptions respectives.

Changement des noms des colonnes et décodage de leurs valeurs

colnames <- read.delim("~/Documents/lab2/colnames.txt")
head(colnames)
##                   Columname
## 1          Customer Subtype
## 2 MAANTHUI Number of houses
## 3        Avg size household
## 4                   Avg age
## 5        Customer main type
## 6            Roman catholic
names(AssurancExpertsInc)=colnames$Columname
#L0
L0 <- read.csv2("~/Documents/lab2/L0.txt")
print(nrow(AssurancExpertsInc))
## [1] 9822
AssurancExpertsInc[,"Customer Subtype"]<- L0[match(AssurancExpertsInc[,"Customer Subtype"],L0$id),3]

#L1
L1 <- read.csv("~/Documents/lab2/L1.txt", header=FALSE, comment.char="#")
AssurancExpertsInc[,"Avg age"]<- L1[match(AssurancExpertsInc[,"Avg age"],L1$V1),2]

#L2
L2 <- read.csv("~/Documents/lab2/L2.txt", header=FALSE, comment.char="#")
AssurancExpertsInc[,"Customer main type"]<- L2[match(AssurancExpertsInc[,"Customer main type"],L2$V1),2]


#L3
for (i in which(colnames(AssurancExpertsInc)=="Roman catholic"):which(colnames(AssurancExpertsInc)=="Purchasing power class")){
  AssurancExpertsInc[,i] <- factor(AssurancExpertsInc[,i],
                   levels=c(0:9),
                   labels=c("0%",
                            "1-10%",
                            "11-23%",
                            "24-36%",
                            "37-49%",
                            "50-62%",
                            "63-75%",
                            "76-88%",
                            "89-99%",
                            "100%"),ordered = TRUE)
   AssurancExpertsInc[,i] <- as.ordered(AssurancExpertsInc[,i] )
}
#L4

for (i in which(colnames(AssurancExpertsInc)=="Contribution private third party insurance"):which(colnames(AssurancExpertsInc)=="Contribution social security insurance policies")){
  AssurancExpertsInc[,i] <- factor(AssurancExpertsInc[,i],
                   levels=c(0:9),
                   labels=c("0",
                            "1",
                            "50",
                            "100",
                            "200",
                            "500",
                            "1000",
                            "5000",
                            "10,000",
                            "20,000"))
   AssurancExpertsInc[,i] <- as.ordered(AssurancExpertsInc[,i])
 }
str(AssurancExpertsInc)
## 'data.frame':    9822 obs. of  87 variables:
##  $ Customer Subtype                                : Factor w/ 40 levels "Affluent senior apartments",..: 16 21 21 22 12 37 14 16 16 7 ...
##  $ MAANTHUI Number of houses                       : int  1 1 1 1 1 1 2 1 1 2 ...
##  $ Avg size household                              : int  3 2 2 3 4 2 3 2 2 3 ...
##  $ Avg age                                         : Factor w/ 6 levels "20-30 years",..: 2 2 2 3 2 1 2 3 4 3 ...
##  $ Customer main type                              : Factor w/ 10 levels " Farmers","Average Family",..: 7 7 7 2 1 8 4 7 7 2 ...
##  $ Roman catholic                                  : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 1 2 1 3 2 1 3 1 1 4 ...
##  $ Protestant                                      : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 6 5 5 4 5 6 3 8 2 6 ...
##  $ Other religion                                  : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 2 2 3 3 2 1 1 1 4 1 ...
##  $ No religion                                     : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 4 5 5 5 5 6 6 3 7 3 ...
##  $ Married                                         : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 8 7 4 6 8 1 8 8 7 8 ...
##  $ Living together                                 : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 1 3 3 3 2 7 3 3 1 1 ...
##  $ Other relation                                  : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 3 3 5 3 3 4 1 1 4 3 ...
##  $ Singles                                         : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 2 1 5 3 3 4 1 1 4 3 ...
##  $ Household without children                      : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 3 5 5 4 5 6 4 6 4 3 ...
##  $ Household with children                         : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 7 6 3 5 5 3 7 5 4 7 ...
##  $ High level education                            : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 2 1 1 4 6 1 1 1 1 1 ...
##  $ Medium level education                          : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 3 6 6 5 5 6 5 4 2 5 ...
##  $ Lower level education                           : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 8 5 5 3 1 5 6 7 9 6 ...
##  $ High status                                     : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 2 1 1 5 1 3 1 3 2 3 ...
##  $ Entrepreneur                                    : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 1 1 1 1 6 1 1 1 2 1 ...
##  $ Farmer                                          : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 2 1 1 1 5 1 1 1 1 1 ...
##  $ Middle management                               : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 3 6 8 4 1 5 5 3 2 4 ...
##  $ Skilled labourers                               : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 6 1 1 2 1 3 2 6 9 4 ...
##  $ Unskilled labourers                             : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 3 5 3 3 1 3 6 3 2 4 ...
##  $ Social class A                                  : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 2 1 1 4 10 3 1 3 2 2 ...
##  $ Social class B1                                 : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 2 3 6 3 1 3 2 2 2 3 ...
##  $ Social class B2                                 : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 3 4 1 2 1 3 5 3 1 2 ...
##  $ Social class C                                  : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 7 6 5 5 1 5 6 6 9 5 ...
##  $ Social class D                                  : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 2 1 1 1 1 3 1 3 2 3 ...
##  $ Rented house                                    : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 2 3 8 6 5 10 7 1 10 1 ...
##  $ Home owners                                     : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 9 8 3 5 6 1 4 10 1 10 ...
##  $ 1 car                                           : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 9 8 8 10 7 6 9 5 6 7 ...
##  $ 2 cars                                          : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 1 2 1 1 3 4 1 5 3 2 ...
##  $ No car                                          : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 2 3 3 1 2 4 2 3 4 3 ...
##  $ National Health Service                         : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 9 7 10 8 6 10 10 7 8 7 ...
##  $ Private health insurance                        : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 2 4 1 3 5 1 1 4 3 4 ...
##  $ Income < 30.000                                 : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 1 3 5 2 1 6 5 3 8 3 ...
##  $ Income 30-45.000                                : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 5 1 6 6 1 3 4 6 3 4 ...
##  $ Income 45-75.000                                : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 6 6 1 4 10 4 4 4 2 4 ...
##  $ Income 75-122.000                               : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 1 3 1 1 1 1 1 1 1 2 ...
##  $ Income >123.000                                 : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Average income                                  : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 5 6 4 5 7 4 4 4 3 5 ...
##  $ Purchasing power class                          : Ord.factor w/ 10 levels "0%"<"1-10%"<"11-23%"<..: 4 5 5 5 4 4 6 4 4 8 ...
##  $ Contribution private third party insurance      : Ord.factor w/ 4 levels "0"<"1"<"50"<"100": 1 3 3 1 1 1 1 1 1 3 ...
##  $ Contribution third party insurance (firms) ...  : Ord.factor w/ 7 levels "0"<"1"<"50"<"100"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Contribution third party insurane (agriculture) : Ord.factor w/ 5 levels "0"<"1"<"50"<"100"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Contribution car policies                       : Ord.factor w/ 7 levels "0"<"200"<"500"<..: 4 1 4 4 1 4 4 1 3 1 ...
##  $ Contribution delivery van policies              : Ord.factor w/ 4 levels "0"<"500"<"1000"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Contribution motorcycle/scooter policies        : Ord.factor w/ 6 levels "0"<"100"<"200"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Contribution lorry policies                     : Ord.factor w/ 5 levels "0"<"200"<"1000"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Contribution trailer policies                   : Ord.factor w/ 6 levels "0"<"1"<"50"<"100"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Contribution tractor policies                   : Ord.factor w/ 6 levels "0"<"100"<"200"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Contribution agricultural machines policies     : Ord.factor w/ 6 levels "0"<"1"<"50"<"100"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Contribution moped policies                     : Ord.factor w/ 6 levels "0"<"50"<"100"<..: 1 1 1 1 1 1 1 3 1 1 ...
##  $ Contribution life insurances                    : Ord.factor w/ 10 levels "0"<"1"<"50"<"100"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Contribution private accident insurance policies: Ord.factor w/ 7 levels "0"<"1"<"50"<"100"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Contribution family accidents insurance policies: Ord.factor w/ 3 levels "0"<"50"<"100": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Contribution disability insurance policies      : Ord.factor w/ 5 levels "0"<"200"<"500"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Contribution fire policies                      : Ord.factor w/ 9 levels "0"<"1"<"50"<"100"<..: 6 3 3 3 7 1 1 1 1 4 ...
##  $ Contribution surfboard policies                 : Ord.factor w/ 4 levels "0"<"1"<"50"<"100": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Contribution boat policies                      : Ord.factor w/ 7 levels "0"<"1"<"50"<"100"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Contribution bicycle policies                   : Ord.factor w/ 2 levels "0"<"1": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Contribution property insurance policies        : Ord.factor w/ 7 levels "0"<"1"<"50"<"100"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Contribution social security insurance policies : Ord.factor w/ 5 levels "0"<"50"<"100"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Number of private third party insurance 1 - 12  : int  0 2 1 0 0 0 0 0 0 1 ...
##  $ Number of third party insurance (firms) ...     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Number of third party insurane (agriculture)    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Number of car policies                          : int  1 0 1 1 0 1 1 0 1 0 ...
##  $ Number of delivery van policies                 : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Number of motorcycle/scooter policies           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Number of lorry policies                        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Number of trailer policies                      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Number of tractor policies                      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Number of agricultural machines policies        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Number of moped policies                        : int  0 0 0 0 0 0 0 1 0 0 ...
##  $ Number of life insurances                       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Number of private accident insurance policies   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Number of family accidents insurance policies   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Number of disability insurance policies         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Number of fire policies                         : int  1 1 1 1 1 0 0 0 0 1 ...
##  $ Number of surfboard policies                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Number of boat policies                         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Number of bicycle policies                      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Number of property insurance policies           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Number of social security insurance policies    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ CLASS                                           : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATUS                                          : Factor w/ 2 levels "Learning","Test": 1 1 1 1 1 1 1 1 1 1 ...

On obtient 23 variables quantitatives et 64 variables qualitatives.

Visualisation des données

Répartition de la modalité CLASS (nombre de clients intéressés/non intéressés )

data<-AssurancExpertsInc
library(plotly)
## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
classLabelFreq <- data.frame(data$CLASS)
classLabelFreq$data.CLASS <- as.factor(data$CLASS)
fig1 = table(data$CLASS)
df1 = data.frame(fig1)
p <- plot_ly(
  x = df1$Var1,
  y = df1$Freq,
  name = "CARAVAN",
  type = "bar"
)
p

On peut intérpreter que le nombre de clients non intéressés par l’assurance de caravane sont beaucoup plus nombreux que ceux qui sont intéressés.

Visualisation de la répartition de CLASS par rapport les autres variables

Customer Main Type par rapport à la CLASS

library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
df2 = sqldf("SELECT s.`Customer main type`,(SELECT count(*) FROM data d1 WHERE CLASS='Yes' and s.`Customer main type` = d1.`Customer main type`) as 'NumberYes',(SELECT count(*) FROM data d2 WHERE CLASS='No' and s.`Customer main type`=d2.`Customer main type`) as 'NumberNo' FROM data s GROUP BY s.`Customer main type`")
CustomerMainType <- df2$`Customer main type`
Yes <- df2$NumberYes
No <- df2$NumberNo
datadf2 <- data.frame(CustomerMainType, Yes, No)
p <- plot_ly(datadf2, x = ~CustomerMainType, y = ~Yes, type = 'bar', name = 'Yes') %>%
  add_trace(y = ~No, name = 'No') %>%
  layout(yaxis = list(title = 'Count'), barmode = 'stack')
p
#AssurancExpertsInc$`Customer main type`

On remarque que la classe modale est ‘Family with grown ups’, cette classe comporte aussi le nombre le plus important des votes des clients intéressés par l’assurance du caravane.

Average Age par rapport à la CLASS

df2 = sqldf("SELECT s.`Avg age`,(SELECT count(*) FROM data d1 WHERE CLASS='Yes' and s.`Avg age` = d1.`Avg age`) as 'NumberYes',(SELECT count(*) FROM data d2 WHERE CLASS='No' and s.`Avg age`=d2.`Avg age`) as 'NumberNo' FROM data s GROUP BY s.`Avg age`")
Avgage <- df2$`Avg age`
Yes <- df2$NumberYes
No <- df2$NumberNo
datadf2 <- data.frame(Avgage, Yes, No)
p <- plot_ly(datadf2, x = ~Avgage, y = ~Yes, type = 'bar', name = 'Yes') %>%
  add_trace(y = ~No, name = 'No') %>%
  layout(yaxis = list(title = 'Count'), barmode = 'stack')
p

On constate ici que les clients intéressés par l’assurance appartiennent à la tranche d’age entre 30 et 60.

Average Income par rapport à la CLASS

df2 = sqldf("SELECT s.`Average income`,(SELECT count(*) FROM data d1 WHERE CLASS='Yes' and s.`Average income` = d1.`Average income`) as 'NumberYes',(SELECT count(*) FROM data d2 WHERE CLASS='No' and s.`Average income`=d2.`Average income`) as 'NumberNo' FROM data s GROUP BY s.`Average income`")
Averageincome <- df2$`Average income`
Yes <- df2$NumberYes
No <- df2$NumberNo
datadf2 <- data.frame(Averageincome, Yes, No)
p <- plot_ly(datadf2, x = ~Averageincome, y = ~Yes, type = 'bar', name = 'Yes') %>%
  add_trace(y = ~No, name = 'No') %>%
  layout(yaxis = list(title = 'Count'), barmode = 'stack')
p

On constate que les clients intéressés par l’assurance est concentrée au niveau de 24 à 62% qui se caractérise par un revenue moyen.

Vu le nombre de variables énorme dans notre jeu de données, on va limiter notre visualisation seulement au niveau des variables qui présentent une concentration immportante des clients dans la classe modale.

db1=AssurancExpertsInc[which(AssurancExpertsInc$CLASS=="Yes"),]
for(i in 1:85){
maxi=max(table(db1[,i]))
if (maxi>400){
  print(names(db1)[i])
}
}
## [1] "MAANTHUI Number of houses"
## [1] "Farmer"
## [1] "Income >123.000"
## [1] "Contribution third party insurance (firms) ..."
## [1] "Contribution third party insurane (agriculture)"
## [1] "Contribution car policies"
## [1] "Contribution delivery van policies"
## [1] "Contribution motorcycle/scooter policies"
## [1] "Contribution lorry policies"
## [1] "Contribution trailer policies"
## [1] "Contribution tractor policies"
## [1] "Contribution agricultural machines policies "
## [1] "Contribution moped policies"
## [1] "Contribution life insurances"
## [1] "Contribution private accident insurance policies"
## [1] "Contribution family accidents insurance policies"
## [1] "Contribution disability insurance policies"
## [1] "Contribution surfboard policies"
## [1] "Contribution boat policies"
## [1] "Contribution bicycle policies"
## [1] "Contribution property insurance policies"
## [1] "Contribution social security insurance policies"
## [1] "Number of third party insurance (firms) ..."
## [1] "Number of third party insurane (agriculture)"
## [1] "Number of delivery van policies"
## [1] "Number of motorcycle/scooter policies"
## [1] "Number of lorry policies"
## [1] "Number of trailer policies"
## [1] "Number of tractor policies"
## [1] "Number of agricultural machines policies"
## [1] "Number of moped policies"
## [1] "Number of life insurances"
## [1] "Number of private accident insurance policies"
## [1] "Number of family accidents insurance policies"
## [1] "Number of disability insurance policies"
## [1] "Number of surfboard policies"
## [1] "Number of boat policies"
## [1] "Number of bicycle policies"
## [1] "Number of property insurance policies"
## [1] "Number of social security insurance policies"

Income >123.000 par rapport à la CLASS

df2 = sqldf("SELECT s.`Income >123.000`,(SELECT count(*) FROM data d1 WHERE CLASS='Yes' and s.`Income >123.000` = d1.`Income >123.000`) as 'NumberYes',(SELECT count(*) FROM data d2 WHERE CLASS='No' and s.`Income >123.000`=d2.`Income >123.000`) as 'NumberNo' FROM data s GROUP BY s.`Income >123.000`")
Income..123.000 <- df2$`Income >123.000`
Yes <- df2$NumberYes
No <- df2$NumberNo
datadf2 <- data.frame(Income..123.000, Yes, No)
p <- plot_ly(datadf2, x = ~Income..123.000, y = ~Yes, type = 'bar', name = 'Yes') %>%
  add_trace(y = ~No, name = 'No') %>%
  layout(yaxis = list(title = 'Count'), barmode = 'stack')
p

Comme on a déjà intérpreté auparavant, les clients les plus intéressés ne sont pas généralement caractérisés d’un revenue trés élevé (>123.000)

Préparation des données

Vu le nombre énorme de variables, on a utilisé une selection de paramètres les plus importants pour déterminer l’intéret des clients au assurance de caravane.

On a utilisé 2 méthodes : “Boruta” qui nous a réduit nos 85 variables en 32 variables seulement, et “VarImp” qui a réduit ces variables en 20 variables importantes.

L’union de ces deux méthodes a donné un jeu de données de 44 variables.

Formule de Scoring

FinalData <- read.csv("~/Downloads/test.csv", row.names = 1)
db<-FinalData
db1=db[which(db$CLASS=="Yes"),]
db2=db[which(db$CLASS=="No"),]


#calcule weights Yes
weightsYes=data.frame(row.names = names(db))
for (i in 1:42){
  weight=array(table(db1[,i])/586)
for (j in 1:length(weight)){
  weightsYes[i,j]=weight[j]
}
}
weightsYes[is.na(weightsYes)]<-0

#calcule weights no


weightsNo=data.frame(row.names = names(db))
for (i in 1:42){
  weight=array(table(db2[,i])/9236)
  for (j in 1:length(weight)){
    weightsNo[i,j]=weight[j]
  }
}
weightsNo[is.na(weightsNo)]<-0

#clacule score yes

db$scoreYes<-NA
db$scoreYes=as.numeric(db$scoreYes)
scoring=c()
for(i in seq(1,9822)){
  score<-0
  for (j in seq(1,42)){
    if (is.numeric(db[,j])){
      ind<-db[i,j]
      score<-sum(score,as.numeric(weightsYes[j,ind]))
    }
    else {
      ind<-match(db[i,j],levels(db[,j]))
      score<-sum(score,as.numeric(weightsYes[j,ind]))
    }

  }
db[i,45]<-score
}

#clacule score no
db$scoreNo<-NA
db$scoreNo=as.numeric(db$scoreNo)
scoring=c()
for(i in seq(1,9822)){
  score<-0
  for (j in seq(1,42)){
    if (is.numeric(db[,j])){
      ind<-db[i,j]
      score<-sum(score,as.numeric(weightsNo[j,ind]))
    }
    else {
      ind<-match(db[i,j],levels(db[,j]))
      score<-sum(score,as.numeric(weightsNo[j,ind]))
    }
    
  }
  db[i,46]<-score
}
#calcule score 
db$score<-db$scoreYes-db$scoreNo
plot(db$CLASS,db$score)

On peut intérpreter que les scores des clients intéressés sont concentrés proches de la valeur 0, par contre les clients non intéressés ont des scores proches de -0.5.

Evaluation

nrow(db[which(db$score>0),])
## [1] 2045
head(db[which(db$score>0),])
##                   Customer.Subtype MAANTHUI.Number.of.houses     Avg.age
## 2        Mixed small town dwellers                         1 30-40 years
## 11                   Stable family                         1 40-50 years
## 17            Traditional families                         1 40-50 years
## 19       Young all american family                         1 30-40 years
## 23       Young all american family                         1 30-40 years
## 37 Dinki's (double income no kids)                         1 30-40 years
##    Married Living.together Singles Household.without.children
## 2   63-75%          11-23%      0%                     37-49%
## 11  76-88%           1-10%      0%                     24-36%
## 17  76-88%              0%      0%                     63-75%
## 19  76-88%              0%   1-10%                     24-36%
## 23  76-88%              0%   1-10%                     24-36%
## 37  76-88%          11-23%      0%                     63-75%
##    Household.with.children High.level.education Medium.level.education
## 2                   50-62%                   0%                 50-62%
## 11                  63-75%               37-49%                 24-36%
## 17                  24-36%               11-23%                 63-75%
## 19                  63-75%               50-62%                 37-49%
## 23                  63-75%               24-36%                 50-62%
## 37                  24-36%               11-23%                 76-88%
##    Lower.level.education High.status Entrepreneur Farmer
## 2                 37-49%          0%           0%     0%
## 11                24-36%          0%           0%     0%
## 17                11-23%      11-23%           0%     0%
## 19                 1-10%      63-75%           0%     0%
## 23                 1-10%      63-75%           0%     0%
## 37                    0%      11-23%           0%     0%
##    Unskilled.labourers Social.class.A Social.class.B1 Social.class.D
## 2               37-49%             0%          11-23%             0%
## 11                  0%         24-36%              0%             0%
## 17              37-49%         11-23%          11-23%             0%
## 19               1-10%         50-62%          11-23%             0%
## 23               1-10%         63-75%           1-10%             0%
## 37                  0%         11-23%          24-36%             0%
##    Rented.house Home.owners X1.car X2.cars No.car National.Health.Service
## 2        11-23%      76-88% 76-88%   1-10% 11-23%                  63-75%
## 11           0%        100% 63-75%  11-23%  1-10%                  50-62%
## 17       63-75%      24-36% 76-88%  11-23%     0%                  37-49%
## 19       37-49%      50-62% 76-88%   1-10% 11-23%                  50-62%
## 23       37-49%      50-62% 76-88%  11-23%  1-10%                  50-62%
## 37           0%        100% 76-88%  11-23%     0%                  76-88%
##    Private.health.insurance Income...30.000 Income.30.45.000
## 2                    24-36%          11-23%               0%
## 11                   37-49%              0%           24-36%
## 17                   50-62%              0%            1-10%
## 19                   37-49%          11-23%           11-23%
## 23                   37-49%           1-10%            1-10%
## 37                   11-23%              0%           63-75%
##    Income.45.75.000 Income.75.122.000 Income..123.000 Average.income
## 2            50-62%            11-23%              0%         50-62%
## 11           11-23%            11-23%          11-23%         89-99%
## 17           63-75%            11-23%              0%         50-62%
## 19           63-75%                0%              0%         37-49%
## 23           89-99%                0%              0%         50-62%
## 37           24-36%                0%              0%         37-49%
##    Purchasing.power.class Contribution.private.third.party.insurance
## 2                  37-49%                                          3
## 11                 76-88%                                          1
## 17                 37-49%                                          1
## 19                 63-75%                                          3
## 23                 63-75%                                          3
## 37                 63-75%                                          3
##    Contribution.car.policies Contribution.delivery.van.policies
## 2                          1                                  1
## 11                         7                                  1
## 17                         1                                  1
## 19                         6                                  1
## 23                         7                                  1
## 37                         7                                  1
##    Contribution.life.insurances Contribution.fire.policies
## 2                             1                          3
## 11                            1                          1
## 17                            1                          1
## 19                            1                          3
## 23                            1                          5
## 37                            1                          5
##    Contribution.surfboard.policies
## 2                                1
## 11                               1
## 17                               1
## 19                               1
## 23                               1
## 37                               1
##    Number.of.private.third.party.insurance.1...12 Number.of.car.policies
## 2                                               2                      0
## 11                                              0                      1
## 17                                              0                      0
## 19                                              1                      1
## 23                                              1                      1
## 37                                              1                      2
##    Number.of.life.insurances Number.of.fire.policies CLASS   STATUS
## 2                          0                       1    No Learning
## 11                         0                       0    No Learning
## 17                         0                       0    No Learning
## 19                         0                       1    No Learning
## 23                         0                       1    No Learning
## 37                         0                       1    No Learning
##    scoreYes  scoreNo      score
## 2  13.49488 13.43948 0.05540458
## 11 12.77304 12.57936 0.19367418
## 17 13.04437 12.90699 0.13737423
## 19 13.74573 13.62181 0.12392781
## 23 13.53584 13.15180 0.38403886
## 37 15.32765 13.99556 1.33208420

On a constaté d’après nos résultats qu’il ya eu des clients parmi ceux qui ont montré un intérêt à l’achat d’une caravane lors du sondage qui ne sont pas forcement intéressés. En revanche, Pour autres clients ayant repondu ‘no’, il s’est avéré qu’ils peuvent être intéressés.

Le concept de sondage ne décrit pas réellement le degré d’intéret des clients au assurance de caravane, car il se déroule d’une manière rapide et alétoire.

Donc les clients qui ont répondu ‘no’ et qui sont intéressés d’après nos résultats sont les clients cibles et vont probablement etre convaincus à condition de mettre en place la stratégie de Marketing convenable.

Déploiment

Le résultat de ce projet est une formule de scoring basée sur les poids des modalités des variables sélectionnées.

Pour obtenir une liste des clients potentiels, on recommande cibler les clients ayant un score positif.